Comparative Evaluation of Modular Automatic Summarisation Systems Using CAST
Abstract
The information overload faced by today’s society poses great challenges to researchers who want to find a relevant piece of information. Automatic summarisation is a field of computational linguistics which can help humans to deal with this information overload by automatically extracting the gist of documents. This thesis attempts to gain insights into the automatic summarisation field from several different angles. First, it performs qualitative, quantitative and comparative evaluations of different automatic summarisation methods. These summarisation methods are built around a term-based summariser that is then augmented with additional linguistic information, including lexical, semantic and discourse information. On the basis of these evaluations, it was observed that the choice of modules which provide low-level linguistic information (e.g. morphological processors) does not influence the results significantly, whereas higher-level linguistic information, such as anaphora resolution and shallow information about discourse structure, leads to significant improvements in the summaries. To give a comprehensive view of how good the summaries produced by a given method are, the evaluation performed in this thesis measures both the informativeness of the summaries produced and the quality of their discourse structure. Moreover, a method which determines the upper limit for informativeness is proposed to demonstrate the limits of extraction techniques. A comparison between informativeness and quality of discourse reveals no correlation between them. A third direction pursued in this research is to replace conventional iterative extraction methods, which extract one sentence at a time without considering the ...
Similar resources
CAST: A computer-aided summarisation tool
In this paper we propose computer-aided summarisation (CAS) as an alternative approach to automatic summarisation, and present an ongoing project which aims to develop a CAS system. The need for such an alternative approach is justified by the relatively poor performance of fully automatic methods used in summarisation. Our system combines several summarisation methods, allowing the user of the ...
Evaluating Summarisation Technologies: A Task Oriented Approach
This paper presents a novel task-oriented approach for the evaluation of automatic text summarisation systems. Evaluation of systems has traditionally been a troublesome area in summarisation research. We propose a scheme that evaluates three existing systems by determining their relative effectiveness in an interactive search task, under conditions that approximate the intended use of the syst...
Experiments in Newswire Summarisation
In this paper, we investigate extractive multi-document summarisation algorithms over newswire corpora. Examining recent findings, baseline algorithms, and state-of-the-art systems is pertinent given the current research interest in event tracking and summarisation. We first reproduce previous findings from the literature, validating that automatic summarisation evaluation is a useful proxy for...
Comparing Lexical Chain-based Summarisation Approaches Using an Extrinsic Evaluation
We present a comparative study of lexical chain-based summarisation techniques. The aim of this paper is to highlight the effect of lexical chain scoring metrics and sentence extraction techniques on summary generation. We present our own lexical chain-based summarisation system and compare it to other chain-based summarisation systems. We also compare the chain scoring and extraction techniques...
The limits of automatic summarisation according to ROUGE
This paper discusses some central caveats of summarisation, incurred in the use of the ROUGE metric for evaluation, with respect to optimal solutions. The task is NP-hard, of which we give the first proof. Still, as we show empirically for three central benchmark datasets for the task, greedy algorithms empirically seem to perform optimally according to the metric. Additionally, overall quality ...
Publication date: 2006